Existing methods for large-scale point cloud semantic segmentation require expensive, tedious, and error-prone manual point-wise annotations. Intuitively, weakly supervised training is a direct way to reduce the labeling cost. However, in weakly supervised large-scale point cloud semantic segmentation, too few annotations inevitably lead to ineffective network learning. We propose an effective weakly supervised method with two components to address this problem. First, we construct a pretext task, \textit{i.e.,} point cloud colorization, trained with self-supervised learning, to transfer prior knowledge learned from a large amount of unlabeled point clouds to a weakly supervised network. In this way, the representation capability of the weakly supervised network is improved by guidance from a heterogeneous task. Second, to generate pseudo labels for unlabeled data, we propose a sparse label propagation mechanism based on generated class prototypes, which are used to measure the classification confidence of unlabeled points. Our method is evaluated on large-scale point cloud datasets covering both indoor and outdoor scenarios. The experimental results show large gains over existing weakly supervised methods and results comparable to fully supervised methods\footnote{Code based on MindSpore: https://github.com/dmcv-ecnu/MindSpore\_ModelZoo/tree/main/WS3\_MindSpore}.
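The prototype-based label propagation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `propagate_labels`, the use of mean embeddings as prototypes, cosine similarity as the confidence measure, and the `threshold` value are all assumptions made for illustration.

```python
import numpy as np

def propagate_labels(features, labels, num_classes, threshold=0.9):
    """Assign pseudo labels to unlabeled points via class prototypes.

    features: (N, D) array of per-point embeddings (hypothetical input).
    labels:   (N,) int array, class id for labeled points and -1 otherwise.
    Returns a copy of labels with confident unlabeled points filled in.
    """
    # L2-normalise so that dot products are cosine similarities.
    feats = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)

    # One prototype per class: the mean embedding of its labeled points
    # (an assumed definition; the abstract does not specify it).
    prototypes = np.zeros((num_classes, feats.shape[1]))
    for c in range(num_classes):
        members = feats[labels == c]
        if len(members) > 0:
            prototypes[c] = members.mean(axis=0)
    prototypes /= np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-12

    # Similarity of every point to every prototype, shape (N, C).
    sim = feats @ prototypes.T
    best_class = sim.argmax(axis=1)
    confidence = sim.max(axis=1)

    # Only unlabeled points whose confidence clears the threshold get a label.
    pseudo = labels.copy()
    accept = (labels == -1) & (confidence >= threshold)
    pseudo[accept] = best_class[accept]
    return pseudo
```

Points that are not clearly closest to one prototype keep the label -1, so only confident pseudo labels would reach the training loss.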
translated by 谷歌翻译
Establishing open and general benchmarks has been a critical driving force behind the success of modern machine learning techniques. As machine learning is being applied to broader domains and tasks, there is a need to establish richer and more diverse benchmarks to better reflect the reality of the application scenarios. Graph learning is an emerging field of machine learning that urgently needs more and better benchmarks. To accommodate the need, we introduce Graph Learning Indexer (GLI), a benchmark curation platform for graph learning. In comparison to existing graph learning benchmark libraries, GLI highlights two novel design objectives. First, GLI is designed to incentivize \emph{dataset contributors}. In particular, we incorporate various measures to minimize the effort of contributing and maintaining a dataset, increase the usability of the contributed dataset, as well as encourage attributions to different contributors of the dataset. Second, GLI is designed to curate a knowledge base, instead of a plain collection, of benchmark datasets. We use multiple sources of meta information to augment the benchmark datasets with \emph{rich characteristics}, so that they can be easily selected and used in downstream research or development. The source code of GLI is available at \url{https://github.com/Graph-Learning-Benchmarks/gli}.
Multi-instance learning (MIL) is a powerful paradigm for dealing with complex data and has achieved impressive results in a number of fields, including image classification and video anomaly detection, among others. Each data sample is referred to as a bag containing several unlabeled instances, and supervision is provided only at the bag level. The safety of MIL learners is a concern, though, because they can be fooled by introducing a few adversarial perturbations. This can be fatal in some cases, such as when users are unable to access desired images or criminals attempt to trick surveillance cameras. In this paper, we design two adversarial perturbations to interpret the vulnerability of MIL methods. The first method efficiently generates a bag-specific perturbation (called customized) that aims to push the bag out of its original classification region. The second method builds on the first by investigating an image-agnostic perturbation (called universal) that aims to affect all bags in a given data set and therefore generalizes to some extent. We conduct various experiments to verify the performance of these two perturbations, and the results show that both can effectively fool MIL learners. We additionally propose a simple strategy to lessen the effects of adversarial perturbations. Source code is available at https://github.com/InkiInki/MI-UAP.
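Safe interaction with other traffic participants is one of the core requirements for autonomous driving, especially at intersections and under occlusion. Most existing approaches are designed for specific scenarios and require substantial manual parameter tuning before they can be applied to different situations. To address this problem, we first propose a learning-based Interaction Point Model (IPM), which describes the interaction between agents in a unified manner in terms of protection time and interaction priority. We further integrate the proposed IPM into a novel planning framework and demonstrate its effectiveness and robustness through comprehensive simulations in highly dynamic environments.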
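Glass is very common in our daily life. Existing computer vision systems neglect it, which may have severe consequences, e.g., a robot may crash into a glass wall. However, sensing the presence of glass is not straightforward. The key challenge is that arbitrary objects and scenes can appear behind the glass. In this paper, we pose the important problem of detecting glass surfaces from a single RGB image. To address it, we construct the first large-scale glass detection dataset (GDD) and propose a novel glass detection network, called GDNet-B, which explores abundant contextual cues in a large field of view via a novel large-field contextual feature integration (LCFI) module and integrates high-level and low-level boundary features with a boundary feature enhancement (BFE) module. Extensive experiments demonstrate that GDNet-B achieves satisfying glass detection results on images both within and beyond the GDD test set. We further validate the effectiveness and generalization capability of GDNet-B by applying it to other vision tasks, including mirror segmentation and salient object detection. Finally, we show potential applications of glass detection and discuss possible future research directions.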
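The adaptation of a Generative Adversarial Network (GAN) aims to transfer a pre-trained GAN to a given domain with limited training data. In this paper, we focus on the one-shot case, which is more challenging and has rarely been explored in previous works. We consider that the adaptation from the source domain to the target domain can be decoupled into two parts: the transfer of global style, such as texture and color, and the emergence of new entities that do not belong to the source domain. While previous works mainly focus on style transfer, we propose a novel and concise framework\footnote{\url{https://github.com/thevoidname/generalized-onerized-one-one-shot-gan-adaption}} to address the \textit{generalized one-shot adaptation} task for both style and entity transfer, in which a reference image and its binary entity mask are provided. Our core objective is to constrain the gap between the internal distributions of the reference and the syntheses using the sliced Wasserstein distance. To better achieve this, style fixation is first used to roughly obtain the exemplary style, and an auxiliary network is added to the original generator to disentangle entity and style transfer. In addition, to realize cross-domain correspondence, we propose a variational Laplacian regularization that constrains the smoothness of the adapted generator. Both quantitative and qualitative experiments demonstrate the effectiveness of our method in various scenarios.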
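The sliced Wasserstein distance named in this abstract can be illustrated with a short NumPy sketch. This is a generic Monte Carlo estimator, not the paper's loss: the function name `sliced_wasserstein`, the number of projections, and the restriction to equally sized sample sets are assumptions made for illustration.

```python
import numpy as np

def sliced_wasserstein(x, y, num_projections=128, seed=0):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance.

    x, y: (N, D) arrays holding two equally sized sets of feature vectors.
    """
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equally sized sample sets")
    rng = np.random.default_rng(seed)

    # Random unit directions, shape (num_projections, D).
    directions = rng.normal(size=(num_projections, x.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Project both sets onto each direction, then sort along the sample axis.
    # In 1D, optimal transport matches sorted samples one to one.
    x_proj = np.sort(x @ directions.T, axis=0)
    y_proj = np.sort(y @ directions.T, axis=0)

    # Average squared 1D distance over samples and directions.
    return float(np.sqrt(np.mean((x_proj - y_proj) ** 2)))
```

Sorting replaces the expensive high-dimensional transport problem with many cheap one-dimensional ones, which is what makes the sliced variant practical as a training objective.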
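Subsampling, or subdata selection, is a useful approach in large-scale statistical learning. Most existing studies focus on model-based subsampling methods, which depend significantly on the model assumption. In this paper, we consider a model-free subsampling strategy for generating subdata from the original full data. To measure how well a subdata set represents the original data, we propose a criterion, the generalized empirical F-discrepancy (GEFD), and study its theoretical properties in connection with the classical generalized L2-discrepancy in the theory of uniform designs. These properties allow us to develop a low-GEFD data-driven subsampling method based on existing uniform designs. Through simulation examples and a real case study, we show that the proposed subsampling method is superior to random sampling. Moreover, our method remains robust under diverse model specifications, whereas other popular subsampling methods underperform. In practice, this model-free property is more appealing than model-based subsampling, which may perform poorly when the model is misspecified, as demonstrated in our simulation studies.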
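Recognizing human actions from point cloud videos has attracted tremendous attention from both academia and industry due to its wide applications, such as autonomous driving and robotics. However, current methods for point cloud action recognition usually require a huge amount of manually annotated data and a complex backbone network with high computational cost, which makes them impractical for real-world applications. This paper therefore considers the task of semi-supervised point cloud action recognition. We propose a Masked Pseudo-Labeling autoEncoder (\textbf{MAPLE}) framework to learn effective representations with far fewer annotations for point cloud action recognition. In particular, we design a novel and efficient \textbf{De}coupled \textbf{s}patial-\textbf{t}emporal Trans\textbf{Former} (\textbf{DestFormer}) as the backbone of MAPLE. In DestFormer, the spatial and temporal dimensions of 4D point cloud videos are decoupled to achieve efficient self-attention for learning both long-term and short-term features. Moreover, to learn discriminative features from fewer annotations, we design a masked pseudo-labeling autoencoder structure that guides DestFormer to reconstruct the features of masked frames from the available frames. More importantly, for unlabeled data, we exploit the pseudo-labels from the classification head as the supervision signal for reconstructing features from the masked frames. Finally, comprehensive experiments demonstrate that MAPLE achieves superior results on three public benchmarks and outperforms the state-of-the-art method by 8.08\% in accuracy on the MSR-Action3D dataset.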
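Automatically measuring lesion/tumor size with RECIST (Response Evaluation Criteria In Solid Tumors) diameters and segmentation is important for computer-aided diagnosis. Although this problem has been studied in recent years, there is still room to improve accuracy and robustness, for example by (1) enhancing features with rich contextual information while keeping a high spatial resolution, and (2) involving new tasks and losses for joint optimization. To this end, this paper proposes a transformer-based network (MeaFormer, Measurement transFormer) for lesion RECIST diameter prediction and segmentation (LRDPS). It is formulated as three correlated and complementary tasks: lesion segmentation, heatmap prediction, and keypoint regression. To the best of our knowledge, this is the first use of keypoint regression for RECIST diameter prediction. MeaFormer enhances high-resolution features by employing transformers to capture their long-range dependencies. Two consistency losses are introduced to explicitly build relationships among these tasks for better optimization. Experiments show that MeaFormer achieves state-of-the-art LRDPS performance on the large-scale DeepLesion dataset and produces results for two downstream clinically relevant tasks, i.e., 3D lesion segmentation and RECIST assessment in longitudinal studies.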
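To make the link between regressed keypoints and diameter measurements concrete, the sketch below converts four 2D keypoints into two diameter lengths. It rests on assumptions the abstract does not state: that each diameter is represented by two endpoint keypoints, that the keypoints arrive in the order shown, and that a single isotropic pixel spacing applies. The function name `diameters_from_keypoints` is hypothetical.

```python
import numpy as np

def diameters_from_keypoints(keypoints, pixel_spacing_mm=1.0):
    """Return (long, short) diameter lengths from four 2D keypoints.

    keypoints: (4, 2) array; rows 0-1 are the endpoints of one diameter and
    rows 2-3 the endpoints of the other (an assumed layout).
    """
    pts = np.asarray(keypoints, dtype=float)
    if pts.shape != (4, 2):
        raise ValueError("expected four (x, y) keypoints")
    first = np.linalg.norm(pts[0] - pts[1]) * pixel_spacing_mm
    second = np.linalg.norm(pts[2] - pts[3]) * pixel_spacing_mm
    # Report the longer diameter first regardless of the input order.
    return (max(first, second), min(first, second))
```

For example, `diameters_from_keypoints([[0, 0], [3, 4], [0, 2], [2, 2]])` gives a long diameter of 5 pixels and a short diameter of 2 pixels before any spacing is applied.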
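Image-goal navigation is a challenging task, as it requires the agent to navigate to a target indicated by an image in a previously unseen scene. Current methods introduce various memory mechanisms that store the navigation history to solve this task. However, these methods use all observations in memory to generate navigation actions, without considering which part of the memory is informative. To address this limitation, we present MemoNav, a novel memory mechanism for image-goal navigation that retains the agent's informative short-term memory and long-term memory to improve navigation performance on multi-goal tasks. The node features on the agent's topological map are stored in short-term memory, since these features are dynamically updated. To aid the short-term memory, we also generate long-term memory by continuously aggregating the short-term memory via a graph attention module. MemoNav retains the informative fraction of the short-term memory via a forgetting module based on a Transformer decoder, and then incorporates this retained short-term memory and the long-term memory into working memory. Finally, the agent uses the working memory for action generation. We evaluate our model on a new multi-goal navigation dataset. The experimental results show that MemoNav outperforms state-of-the-art methods while using a smaller fraction of the navigation history. The results also show empirically that our model is less likely to be trapped in a deadlock, which further confirms that MemoNav improves the agent's navigation efficiency by reducing redundant steps.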
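The idea of retaining only the informative fraction of short-term memory can be sketched as a top-k selection over per-node relevance scores. This is an illustrative simplification, not the MemoNav forgetting module: the scores here are a hypothetical stand-in for attention weights from the decoder, and the function name `retain_informative` and the `keep_ratio` parameter are assumptions.

```python
import numpy as np

def retain_informative(memory, scores, keep_ratio=0.5):
    """Keep the highest-scoring fraction of short-term memory.

    memory: (N, D) node features; scores: (N,) relevance of each node to the
    goal (hypothetical stand-in for decoder attention weights).
    Returns the retained features and their original indices.
    """
    memory = np.asarray(memory)
    scores = np.asarray(scores)
    num_keep = max(1, int(np.ceil(keep_ratio * len(scores))))
    # Indices of the top scores, re-sorted to preserve the original node order.
    keep = np.sort(np.argsort(scores)[-num_keep:])
    return memory[keep], keep
```

The retained features would then be combined with the long-term memory to form the working memory that drives action generation.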